Effects of stand age, moisture class, and fire severity on legacy C retention in boreal forest soils post burning event

A replication of data analysis presented in Walker et al (2019) ‘Increasing wildfires threaten historic carbon sink of boreal forest soils’.

Required Packages:

library(curl)
library(ggplot2)
library(lmtest)
library(effects)
library(broom)

Required Data:

s <- curl('https://raw.githubusercontent.com/vfrench-397/AN588_ReplicationAnalysis/main/Data/Radiocarbon_site_data.csv')
site.data <- read.csv(file = s, header = TRUE, stringsAsFactors = TRUE) #general site classifications for each plot (moisture class, stand age and stand age class)
head(site.data)
##   burned_site_plot latitude longitude elevation slope aspect moisture_johnstone
## 1          SS33-3A 60.93776 -117.3727   227.547     0  -9999              mesic
## 2          SS33-4A 60.94248 -117.3788   229.535     0  -9999    mesic-subhygric
## 3       ZF104-110A 63.10914 -113.9700   293.283     0  -9999          subhygric
## 4        ZF104-63B 63.06745 -113.7327   266.376     0  -9999    mesic-subhygric
## 5        ZF104-83B 63.04578 -114.0243   270.940     0  -9999              mesic
## 6         ZF20-10A 61.80457 -116.7246   238.162     0  -9999     mesic-subxeric
##   moisture_class stand_age_rings stand_age_class residual_sol_depth burn_depth
## 1          Moist              90      old-burned              13.55       9.53
## 2            Wet             105      old-burned              19.40       6.83
## 3            Wet              56    young-burned              24.90      10.00
## 4            Wet              40    young-burned              10.70       7.96
## 5          Moist              54    young-burned               5.95      13.76
## 6          Moist              84      old-burned               7.95       5.31
##   prefire_sol_depth bg_c_residual bg_c_combusted prefire_bg_c tree_density
## 1             23.08       6970.04        3821.72     10791.76         3.13
## 2             26.23       8754.96        2929.55     11684.50         0.60
## 3             34.90       6958.69        2674.51      9633.20         0.83
## 4             18.66       3309.66        3199.05      6508.71         0.88
## 5             19.71       3216.49        4133.52      7350.01         0.63
## 6             13.26       2242.22        2092.95      4335.16         1.93
##   tree_c_prefire
## 1        2938.43
## 2        3125.83
## 3         549.25
## 4         163.68
## 5        1129.18
## 6        2414.58
d <- curl('https://raw.githubusercontent.com/vfrench-397/AN588_ReplicationAnalysis/main/Data/Soil_radiocarbon_data.csv')
soil.data <- read.csv(file = d, header = TRUE, stringsAsFactors = TRUE) #radioactive C data for site soil depth increments and atmosphere at the time of stand establishment
head(soil.data)
##   burned_site_plot latitude longitude sample_transect sample_monolith
## 1          SS33-3A 60.93776 -117.3727               0               1
## 2          SS33-3A 60.93776 -117.3727               0               1
## 3          SS33-3A 60.93776 -117.3727               0               1
## 4          SS33-3A 60.93776 -117.3727              12               2
## 5          SS33-3A 60.93776 -117.3727              12               2
## 6          SS33-3A 60.93776 -117.3727              12               2
##   site_sample_transect soil_depth_increment increment_location
## 1            SS33-3A.0                    2                TOP
## 2            SS33-3A.0                   12             BOTTOM
## 3            SS33-3A.0                    1                TOP
## 4           SS33-3A.12                   20             BOTTOM
## 5           SS33-3A.12                    1                TOP
## 6           SS33-3A.12                    2                TOP
##   bottom_sol_material field_sample_id uciams_id Delta14C std_deviation
## 1             mineral            RC86    190142     81.2           2.1
## 2             mineral            RC90    190133     40.9           2.4
## 3             mineral            RC85    190141     71.4           2.1
## 4             mineral            RC30    192176     46.3           1.5
## 5             mineral            RC25    192199     37.9           2.0
## 6             mineral            RC26    192175    111.2           1.5
##   soil_class_pre.post_bomb stand_age_rings stand_year X14C_yr_stand_age
## 1                     post              90       1924            -12.88
## 2                     post              90       1924            -12.88
## 3                     post              90       1924            -12.88
## 4                      pre              90       1924            -12.88
## 5                     post              90       1924            -12.88
## 6                     post              90       1924            -12.88
##   stand_age_pre.post_bomb
## 1                     pre
## 2                     pre
## 3                     pre
## 4                     pre
## 5                     pre
## 6                     pre

Background

Boreal regions have extremely thick soil organic layers due to their relatively slow rates of decomposition. Meaning they can be extremely effective in storing and retaining large amounts of carbon, preventing C from being emitted to the atmosphere and contributing to the formation of greenhouse gases.

When wildfires sweep through boreal regions, they combust the exposed soil organic layer, releasing stored C to the atmosphere. Due to the thick nature of these soil organic layers (SOLs), combustion is restricted to a proportion of the entire SOL. The portion that is unaffected by the most recent burning event is expected to retain ‘legacy carbon’.

With increasing frequency and severity of wildfire events as a result of global warming, these legacy carbon pools are under threat.

The goal of the original paper was to assess legacy C loss in sites across the Northwest Territory, Canada by identifying sites that possessed legacy C before and retained their legacy C pools after a burning event in 2014. Sites present with Legacy C were identified (through radiocarbon dating) as sites with any organic soil C that is older than the stand age (time since previous burning event).

Experimental Design

In 32 plots \(\Delta^{14}C\) (parts per thousand difference between sample carbon-14:carbon-12 and the international standard ratio) was measured at several soil depth increments. The 32 plots were assigned a moisture class (based on their topography, drainage, presence of permafrost and their soil class) and a stand age class (young < 60 years since previous burning event, old > 70 years, based on the historic fire return interval of North American northwest coast boreal ecosystems).

Soil samples were dated (based on \(\Delta^{14}C\) values) relative to a known historic peak C level from nuclear bomb testing in the area (further known as bomb peak).

Site establishment was dated based on tree rings.

Hypotheses

  1. Legacy C is more likely to be present in wet ecosystems (high soil organic matter (SOM) C accumulation due to low decomposition rates and low burn severity). Legacy C is less likely to be present in dry ecosystems (low SOM accumulation and higher severity burns)
  2. Legacy C is more likely to combust in historically moist (moderate) ecosystems due to their larger legacy C pools (ability to harbor legacy C previously) and their higher burn risk under drought and increasing wildfire severity.
  3. Legacy C is more likely to combust in young stands because the natural protective layer of SOM has not yet re-accumulated.

Analysis outline

The original paper conducted Firth’s bias reduced logistic regression to assess whether the presence of legacy C could be predicted from stand age and the prefire SOL depth (which is acting as a proxy for moisture class), and whether the combustion of legacy C could be predicted from the stand age and the proportion of the SOL burned (which is acting as a proxy for fire severity). They used Firth’s logistic regression to reduce problems associated with highly predictive covariates, common in generalized linear model regressions using smaller sample sizes. I attempted the Firth’s bias reduced logistic regression using the {logistf} package as well as the standard logistic regression using the glm() function and neither produced significant results so I kept in the standard regression code to condense the size of this report.

The authors used AIC to test for interactions between covariates in which they found that the interaction term between predictor variables could be dropped for the final model. I found a similar result running log-likelihood tests so I did not replicate the AIC model selection.

Finally, the original authors explored mixed modeling using individual fires as a random effect but found the mixed modelling did not produce significantly different results than the logistic regression and so again I did not replicate this analysis.

Presence of Legacy C

To identify plots that possess legacy C, the \(\Delta^{14}C\) values of the basal organic soil layer were compared to \(\Delta^{14}CO_2\) values of the atmosphere during the year of stand establishment. Legacy C was considered present if the base of the SOL was older than the year of stand establishment (meaning \(\Delta^{14}C\) lower (having decayed for longer) than \(\Delta^{14}CO_2\)). Therefore the first step is to extract basal SOL measurements from the larger data set.

basal.layer <- soil.data[soil.data$increment_location == 'BOTTOM', ]
basal.layer <- basal.layer[duplicated(basal.layer$burned_site_plot) == FALSE,] 

Next we determine and label the presence/absence of legacy carbon for each plot using a for loop with an embedded if/else conditions.

basal.layer$LC <- NULL
for (i in 1:nrow(basal.layer)) {
  if (basal.layer$Delta14C[i] < basal.layer$X14C_yr_stand_age[i]) {
    basal.layer$LC[i] <- 'Present'
  }
  else {
    basal.layer$LC[i] <- 'Absent'
  }
}
sum(basal.layer$LC == 'Present') #number of sites determined to possess legacy C 
## [1] 19

Data Visualization

For visualization, we must manipulate the data, combining the carbon measurements of the basal soil layer and the atmosphere at time of stand establishment into a single variable, for the ggplot object.

basal.carbon <- basal.layer[,c('burned_site_plot', 'soil_depth_increment', 'Delta14C', 'soil_class_pre.post_bomb', 'stand_age_pre.post_bomb', 'LC')] #extract Delta14C and other associated site information
basal.carbon$type <- 'Soil.Base' #designate delta 14 C as class soil base 
names(basal.carbon) <- c('Plot', 'Depth', 'Delta.14.C', 'Soil.Class','Stand.Class','LC', 'Location')
basal.stand.age <- basal.layer[,c('burned_site_plot', 'soil_depth_increment', 'X14C_yr_stand_age', 'soil_class_pre.post_bomb', 'stand_age_pre.post_bomb', 'LC')]
basal.stand.age$type <- 'Stand.Age' #designate atm C as a class stand age 
names(basal.stand.age) <- c('Plot', 'Depth', 'Delta.14.C', 'Soil.Class', 'Stand.Class','LC', 'Location')
legacy.carbon <- rbind(basal.carbon, basal.stand.age)

Divide the legacy carbon data set into pre and post bomb peak sites.

legacy.carbon.pre <- legacy.carbon[legacy.carbon$Stand.Class == 'pre',]
legacy.carbon.post <- legacy.carbon[legacy.carbon$Stand.Class == 'post',]

Then we can plot basal layer \(\Delta^{14}C\) against atmospheric \(\Delta^{14}CO_2\) at time of stand establishment (pre and post bomb peak)

Figure 2 b and c from original paper

ggplot(legacy.carbon.pre, aes(x= Location, y = Delta.14.C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = Plot, linetype = LC), color = 'black') + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon')

Caption: Basal \(\Delta^{14}C\) (green circles) compared to atmospheric concentrations of \(^{14}C\) at the time of stand establishment (red triangles) in sites dated pre-bomb peak (pre 1966). Dashed lines represent sites where Legacy C is present. Solid lines represent sites where Legacy C was absent.

ggplot(legacy.carbon.post, aes(x= Location, y = Delta.14.C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = Plot, linetype = LC), color = 'black') + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = 'dashed')

Caption: Basal \(\Delta^{14}C\) (green circles) compared to atmospheric concentrations of \(^{14}C\) at the time of stand establishment (red triangles) in sites dated post-bomb peak (post 1966). Dashed lines represent sites where Legacy C is present. Solid lines represent sites where Legacy C was absent.

We can also visualize the relationship between C and our expected predictor variables.

#Creating a data set that has both our predictor and response variables so we can plot the relationships and call on it during the regression.
log.r.data <- cbind(basal.layer, site.data)
log.r.data <- log.r.data[,-c(21,22)]
ggplot(data = log.r.data, aes(x = LC, y = stand_age_rings)) + geom_boxplot(aes(color = LC)) + theme_bw() + labs(title = 'Legacy C v. Stand Age', x = 'Legacy Carbon', y = 'Stand Age (yr)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none") + geom_hline(aes(yintercept = 60), linetype = 'dashed')

Caption: Legacy C class (presence or absence) plotted against the sites stand age (time since stand establishment, or time since previous burning event). The dashed horizontal line at 60 years represents the cut off point for age class (young or old). The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range). Legacy C presence (red) was more likely in younger stands where Legacy C was absent (green) in only older stands.

We can see from comparing the basal layer C levels of old and young plots that they have similar \(\Delta^{14}C\) levels which indicates they both underwent similar cycles of accumulation/combustion in previous burn cycles. Therefore the large proportion of young stands harboring legacy C could be due to the young soils not yet re-accumulating their C stocks.

ggplot(data = log.r.data, aes(x = stand_age_class, y = Delta14C)) + geom_boxplot(aes(color = stand_age_class)) + theme_bw() + labs(title = 'Stand Age Class v. Delta 14 C', x = 'Stand Age', y = 'Delta 14 C (parts per thousand)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none")

Caption: Stand age class (old = green and young = red) plotted against basal \(\Delta^{14}C\) levels to establish historic burn cycle effects on C combustion and accumulation. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range).

ggplot(data = log.r.data, aes(x = LC, y = prefire_sol_depth)) + geom_boxplot(aes(color = LC)) + theme_bw() + labs(title = 'Legacy Carbon v. Prefire SOL Depth ', x = 'Legacy Carbon', y = 'Prefire SOL depth (cm)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none")

Caption: Legacy C class (presence = red and absence = green) plotted against prefire SOL depth (cm). The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range).

You can see Legacy C is present in all plots with prefire SOL depth > 30 cm which is indicative of wetter ecosystems. Using prefire SOL depth as a proxy for moisture class is appropriate, as we can see clear trends across prefire SOL depth when plotted against mositure class.

ggplot(data = log.r.data, aes(x = moisture_class, y = prefire_sol_depth)) + geom_boxplot(aes(color = LC)) + theme_bw() + labs(title = 'Moisture Class v. Prefire SOL depth', x = 'Moisture Class', y = 'Prefire SOL depth (cm)') + scale_color_manual(values = c('darkgreen', 'darkred', 'darkblue')) 

Caption: Moisture class (dry, moist and wet) plotted against the prefire SOL depth (cm). Each moisture class subset by the presence (red) or absence (green) of Legacy C. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range). The prefire SOL depths are distinct for each moisture class indicating prefire SOL depth is an appropriate proxy for moisture class.

Multiple Logistic Regression

We will generate a generalized linear model for this data because we cannot assume a normal distribution of our binary response variable or its residuals and therefore cannot fit the assumptions of a general linear model. We use a logistic regression, a form of a generalized linear model, because our response variable is binary (presence or absence of legacy C) so we must use the logit link function to calculate the natural log of the odds ratio between our two possible outcomes and use that output as our response variable. When preparing the data to be placed into the glm() function we need to make sure our variables are factored to avoid fitting errors when running the regression.

log.r.data$LC <- as.factor(log.r.data$LC)
levels(log.r.data$LC)
## [1] "Absent"  "Present"

Model Selection

To evaluate the model significance we can compare model fit between a full, a reduced and a null model. In our case the full model will include the interaction terms between the predictor variables stand age and SOL depth. The reduced model will remove only the interaction term between the two predictor variables and the null model will be an intercept only model.

full.presence <- glm(data = log.r.data, formula = LC ~ prefire_sol_depth*stand_age_rings, family = binomial)
summary(full.presence)
## 
## Call:
## glm(formula = LC ~ prefire_sol_depth * stand_age_rings, family = binomial, 
##     data = log.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6056  -0.9864   0.5256   0.7918   1.8665  
## 
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                        2.714911   2.336077   1.162   0.2452  
## prefire_sol_depth                 -0.057961   0.109514  -0.529   0.5966  
## stand_age_rings                   -0.042048   0.025303  -1.662   0.0965 .
## prefire_sol_depth:stand_age_rings  0.001444   0.001190   1.214   0.2249  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 34.501  on 28  degrees of freedom
## AIC: 42.501
## 
## Number of Fisher Scoring iterations: 5
reduced.presence <- glm(data = log.r.data, formula = LC ~ prefire_sol_depth + stand_age_rings, family = binomial)
summary(reduced.presence)
## 
## Call:
## glm(formula = LC ~ prefire_sol_depth + stand_age_rings, family = binomial, 
##     data = log.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6482  -1.0558   0.4442   0.8946   1.7336  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        0.169088   1.172339   0.144   0.8853  
## prefire_sol_depth  0.079485   0.044768   1.775   0.0758 .
## stand_age_rings   -0.014321   0.008787  -1.630   0.1031  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 36.289  on 29  degrees of freedom
## AIC: 42.289
## 
## Number of Fisher Scoring iterations: 4
null.presence <- glm(data = log.r.data, formula = LC ~ 1, family = binomial)
summary(null.presence)
## 
## Call:
## glm(formula = LC ~ 1, family = binomial, data = log.r.data)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.342  -1.342   1.021   1.021   1.021  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.3795     0.3599   1.054    0.292
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.23  on 31  degrees of freedom
## Residual deviance: 43.23  on 31  degrees of freedom
## AIC: 45.23
## 
## Number of Fisher Scoring iterations: 4

We can compare the models using a log likelihood test which utilizes a ratio of the log-likelihoods as the test statistic and utilizes the Chi-squared distribution to calculate the p-values. Therefore we can run a log-likelihood test by arguing the two models to the anova() function and additionally including test = 'Chisq' argument.

anova(reduced.presence, full.presence, test = 'Chisq') 
## Analysis of Deviance Table
## 
## Model 1: LC ~ prefire_sol_depth + stand_age_rings
## Model 2: LC ~ prefire_sol_depth * stand_age_rings
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        29     36.289                     
## 2        28     34.501  1   1.7878   0.1812

Here the chi-squared p value is not significant so the inclusion of the interaction term does not significantly decrease deviance and we can resume with the reduced model.

We can also run a log-likelihood test using the lrtest() from the {lmtest} package

lrtest(reduced.presence, full.presence) 
## Likelihood ratio test
## 
## Model 1: LC ~ prefire_sol_depth + stand_age_rings
## Model 2: LC ~ prefire_sol_depth * stand_age_rings
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1   3 -18.145                     
## 2   4 -17.251  1 1.7878     0.1812

Here the higher log likelihood value indicates a greater model fit, but the chi-squared p value is not significant so even though the full model has a ‘better’ fit it is not significantly better. Therefore, again we can proceed with utilizing the reduced model for our analysis.

Next, we must compare the accepted model(the reduced model) against the null model

anova(null.presence, reduced.presence, test = 'Chisq') 
## Analysis of Deviance Table
## 
## Model 1: LC ~ 1
## Model 2: LC ~ prefire_sol_depth + stand_age_rings
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        31     43.230                       
## 2        29     36.289  2   6.9407  0.03111 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the chi-squared p value is less than alpha (.05) meaning the fuller model is associated with less residual deviance and therefore is a better fit! We can reject the null hypothesis that the removal of the predictor variables does not result in a loss of fit and proceed with utilizing the reduced model for our analysis.

Interpretation

Our slope estimate (\(\beta1\)) now represents the associated change in the response variable (which if we remember is the natural log of the odds ratio) with an increase in the predictor variable of 1 unit.

The coefficient/estimate for prefire SOL depth was positive. Therefore increasing prefire SOL depth results in an increased likelihood of legacy C being present. For every one cm of prefire SOL depth, the probability of legacy being present increases by 0.079485. Where the estimate for stand age is negative indicating every year the stand ages the likelihood of legacy C being present decreased by 0.014321. Even though neither of these relationships showed a significant p-value.

In order to get the actual odds ratio we must back transform the estimates using the reverse of the link function. We can find the inverse of the logit link function by pulling it from the model.

fam <- family(reduced.presence)
rev.logit <- fam$linkinv #eta function
SOL.depth.change <- rev.logit(reduced.presence$coefficients[2])
stand.age.change <- rev.logit(reduced.presence$coefficients[3])

SOL.depth.change
## prefire_sol_depth 
##         0.5198608
stand.age.change
## stand_age_rings 
##       0.4964197

1 unit increase in SOL depth results in a 52 % increase in the likelihood of Legacy C presence 1 unit increase in stand age results in a 49% decrease in the likelihood of Legacy C presence

Hypothesis Testing

To test our null hypothesis that the response variable (Legacy C presence) and our predictor variables have no relationship, we can calculate the Wald statistic (similar to a t-statistic) for each predictor variable and compare them to the z distribution.

# Calculating the Wald Statistic for SOL depth
modelresults <- tidy(reduced.presence)
wald <- modelresults$estimate[2]/modelresults$std.error[2]
p <- 2 * (1 - pnorm(wald))  # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 0.07581763
#Wald statistic for stand age
modelresults <- tidy(reduced.presence)
wald <- modelresults$estimate[3]/modelresults$std.error[3]
p <- 2 * (1 - pnorm(wald))  # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 1.896879

Again neither predictor variable seems to show a significant p-value.

Confidence Intervals

Finally, we can calculate the confidence intervals around our estimates

CI <- confint.default(reduced.presence, level = 0.95) # this function returns CIs based on standard errors instead of based on log-likelihood 
CI
##                          2.5 %      97.5 %
## (Intercept)       -2.128654035 2.466829843
## prefire_sol_depth -0.008258719 0.167229045
## stand_age_rings   -0.031542971 0.002900075
#Calculating the CIs for the actual odds ratios
SOL.depth.changeCI <- rev.logit(CI[2, ]) #for prefire SOL depth
stand.age.changeCI <- rev.logit(CI[3, ]) #for stand age 

SOL.depth.changeCI
##     2.5 %    97.5 % 
## 0.4979353 0.5417101
stand.age.changeCI
##     2.5 %    97.5 % 
## 0.4921149 0.5007250

Model Predictions

Before we create the ggplot object we can transform the factored response variable (presence) into a binary value for easier data plotting.

log.r.data$LC.bin <- NULL 
for (i in 1:nrow(log.r.data)) {
  if (log.r.data$LC[i] == 'Present') {
    log.r.data$LC.bin[i] <- 1
  }
  else {
    log.r.data$LC.bin[i] <- 0
  }
}

Although we ran a model with multiple predictor variables, it can be helpful to visualize the predicted likelihood of legacy C presence by each separate predictor variable.

Therefore, we have to generate a new model for each variable.

glm.sol.depth <- glm(data= log.r.data, formula = LC ~ prefire_sol_depth, family = 'binomial')
summary(glm.sol.depth)
## 
## Call:
## glm(formula = LC ~ prefire_sol_depth, family = "binomial", data = log.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5716  -1.1258   0.4911   1.0939   1.3937  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)  
## (Intercept)       -1.09037    0.88393  -1.234   0.2174  
## prefire_sol_depth  0.06993    0.04077   1.715   0.0864 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 39.342  on 30  degrees of freedom
## AIC: 43.342
## 
## Number of Fisher Scoring iterations: 4
glm.stand.age <- glm(data = log.r.data, formula = LC ~ stand_age_rings, family = 'binomial')
summary(glm.stand.age)
## 
## Call:
## glm(formula = LC ~ stand_age_rings, family = "binomial", data = log.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4450  -1.2695   0.7322   0.9323   1.5312  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)  
## (Intercept)      1.650387   0.916089   1.802   0.0716 .
## stand_age_rings -0.012385   0.008052  -1.538   0.1240  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 40.635  on 30  degrees of freedom
## AIC: 44.635
## 
## Number of Fisher Scoring iterations: 4

Then we have to generate a new range of values of our predictor variables for which we will predict LC presence.

new.sol.depth <- as.data.frame(seq(min(log.r.data$prefire_sol_depth),max(log.r.data$prefire_sol_depth), length.out = 32))
names(new.sol.depth) <- 'prefire_sol_depth' #must rename the column so the predict() function can match the model to the new data 
new.stand.age <- as.data.frame(seq(min(log.r.data$stand_age_rings),max(log.r.data$stand_age_rings), length.out = 32))
names(new.stand.age) <- 'stand_age_rings'

We can now predict the LC presence probabilities.

prediction.sol.depth <- cbind(new.sol.depth,predict(glm.sol.depth, newdata = new.sol.depth, type = 'link', se = TRUE))
prediction.stand.age <- cbind(new.stand.age ,predict(glm.stand.age, newdata = new.stand.age, type = 'link', se = TRUE))

and generate the upper and lower confidence intervals for each prediction (using 1.96 because it is the approx critical value for 95% CI). Since we argued the response type should be in the link scale we must back transform the CIs (and the fit when plotting) to plot the values on the appropriate response scale.

prediction.sol.depth$LL <- rev.logit(prediction.sol.depth$fit - 1.96 * prediction.sol.depth$se.fit)
prediction.sol.depth$UL <- rev.logit(prediction.sol.depth$fit + 1.96 * prediction.sol.depth$se.fit)
head(prediction.sol.depth)
##   prefire_sol_depth        fit    se.fit residual.scale        LL        UL
## 1          6.340000 -0.6470421 0.6602462              1 0.1255244 0.6563432
## 2          7.799355 -0.5449953 0.6126501              1 0.1485791 0.6583160
## 3          9.258710 -0.4429484 0.5673067              1 0.1743841 0.6612731
## 4         10.718065 -0.3409016 0.5248001              1 0.2026991 0.6654565
## 5         12.177419 -0.2388548 0.4858755              1 0.2330496 0.6711655
## 6         13.636774 -0.1368080 0.4514602              1 0.2647022 0.6787545
prediction.stand.age$LL <- rev.logit(prediction.stand.age$fit - 1.96 * prediction.stand.age$se.fit)
prediction.stand.age$UL <- rev.logit(prediction.stand.age$fit + 1.96 * prediction.stand.age$se.fit)
head(prediction.stand.age)
##   stand_age_rings      fit    se.fit residual.scale        LL        UL
## 1              19 1.415075 0.7790426              1 0.4720670 0.9498840
## 2              25 1.340766 0.7370696              1 0.4740507 0.9418830
## 3              31 1.266457 0.6959197              1 0.4756329 0.9327963
## 4              37 1.192148 0.6557479              1 0.4767373 0.9225432
## 5              43 1.117839 0.6167454              1 0.4772701 0.9110610
## 6              49 1.043529 0.5791484              1 0.4771156 0.8983168

Finally, plot the predicted probabilities and their 95% CI alongside the actual data

Figure 3.a and b from original paper

p <- ggplot(data = prediction.sol.depth, aes(x= prefire_sol_depth, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.sol.depth, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Prefire SOL depth (cm)", y ="Pr(Present)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = log.r.data, aes(x = prefire_sol_depth, y = LC.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1)
p

Caption: Predicted probability of Legacy C presence for prefire SOL depth. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values

p <- ggplot(data = prediction.stand.age, aes(x= stand_age_rings, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.stand.age, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Stand Age (yr)", y ="Pr(Present)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = log.r.data, aes(x = stand_age_rings, y = LC.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1) + geom_vline(aes(xintercept = 60), linetype = 'dashed')
p

Caption: Predicted probability of Legacy C presence for stand age. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values. The dashed line shows the cutoff for young or old stand age classification.

Soil surface dating relative to bomb peak

Here we will determine the surface soil establishment date relative to bomb peak by comparing \(\Delta^{14}C\) levels at the soil surface to \(\Delta^{14}C\) levels at soil depth increment of 2 cm. In natural states, depth should have less C because it is older (more C has decayed & surface layers accumulate vertically). Therefore, if C levels are higher at the surface than at the 2cm depth then the surface is dated as pre bomb peak. If the soil surface has less C than the 2cm depth, the surface layer is determined to be established post bomb peak. These classifications will be important for when we determine legacy C combustion later in the analysis.

surface.layer <- soil.data[soil.data$increment_location == 'TOP',] #Extract surface layer measurements 
surface.layer$soil_depth_increment <- as.factor(surface.layer$soil_depth_increment)

Assigning stand age class (young; < 60 years or old > 70 years) based on the tree ring dates.

surface.layer$Age <- NULL
for (i in 1:nrow(surface.layer)) {
  if (surface.layer$stand_age_rings[i] < 60) {
    surface.layer$Age[i] <- 'Young'
  }
  else {
    surface.layer$Age[i] <- 'Old'
  }
}

Data Visualization

We can plot the \(\Delta^{14}C\) at both soil increments (1 and 2 cm) to determine the relative soil surface dates. We can plot the pre-bomb and post-bomb sites separately to show the relationship of \(\Delta^{14}C\) between both soil increments more clearly.

#First, divide into pre and pot bomb peak data. 
surface.layer.pre <- surface.layer[surface.layer$soil_class_pre.post_bomb == 'pre', ]
surface.layer.post <- surface.layer[surface.layer$soil_class_pre.post_bomb == 'post', ]

Figure 2.d and e from original paper

ggplot(surface.layer.pre, aes(x = soil_depth_increment, y = Delta14C)) + geom_point(aes(shape = soil_depth_increment, color = soil_depth_increment), size = 3) + geom_line(aes(group = site_sample_transect, linetype = Age)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = 'Soil Depth', y = 'Delta 14 Carbon (parts per thousand)', color = 'Soil Depth', shape = 'Soil Depth') + scale_x_discrete(labels = c('Surface','2 cm'), breaks = c(1,2))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to 2cm depth \(\Delta^{14}C\) (red triangles) in sites where the soil surface was dated pre-bomb peak (pre 1966). Dashed lines represent young sites. Solid lines represent old sites.

ggplot(surface.layer.post, aes(x = soil_depth_increment, y = Delta14C)) + geom_point(aes(shape = soil_depth_increment, color = soil_depth_increment), size = 3) + geom_line(aes(group = site_sample_transect, linetype = Age)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = 'Soil Depth', y = 'Delta 14 Carbon (parts per thousand)', color = 'Soil Depth', shape = 'Soil Depth') + scale_x_discrete(labels = c('Surface','2 cm'), breaks = c(1,2))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to 2cm depth \(\Delta^{14}C\) (red triangles) in sites where the soil surface was dated post-bomb peak (post 1966). Dashed lines represent young sites. Solid lines represent old sites.

Combustion of Legacy C

To assess if legacy C was combusted during the burning event the \(\Delta^{14}C\) values of the soil surface samples were compared to atmospheric \(\Delta^{14}CO_2\) values at the time of stand establishment (only in plots where Legacy C was considered present pre burn). Legacy C was considered combusted if the soil surface was older than the stand age.

Therefore legacy C was combusted in the plot IF:

  1. Soil surface C levels were lower than atmospheric stand age \(CO_2\) levels in sites where the stand age was pre bomb peak AND the soil surface was pre bomb peak.

  2. The stand age was dated post bomb peak AND the soil surface was dated pre bomb peak

  3. The soil surface C levels were higher than atmospheric stand age \(CO_2\) levels in sites where the stand age was post bomb peak AND the soil surface was post bomb peak.

Note: legacy C was not combusted if the stand age was pre bomb peak and the surface was post bomb peak.

First we can extract the soil surface \(\Delta^{14}C\) values.

top.surface.layer<- surface.layer[surface.layer$soil_depth_increment == 1,]
top.surface.layer <- top.surface.layer[duplicated(top.surface.layer$burned_site_plot) == FALSE,] 

Then extract plots where Legacy C was present.

top.surface.layer$LP <- basal.layer$LC
legacy.plots <- top.surface.layer[top.surface.layer$LP == 'Present',]

Then we must identify and label combustion status for each plot.

legacy.plots$LC <- NULL 
for (i in 1:nrow(legacy.plots)) {
  if (legacy.plots$stand_age_pre.post_bomb[i] == 'pre' & legacy.plots$soil_class_pre.post_bomb[i] == 'pre'){
    if (legacy.plots$Delta14C[i] < legacy.plots$X14C_yr_stand_age[i]){
      legacy.plots$LC[i] <- 'Combusted'
    }
    else {
      legacy.plots$LC[i] <- 'Not.Combusted'
    }
  }
  else if (legacy.plots$stand_age_pre.post_bomb[i] == 'post' & legacy.plots$soil_class_pre.post_bomb[i] == 'pre') {
      legacy.plots$LC[i] <- 'Combusted'
    }
  else if (legacy.plots$stand_age_pre.post_bomb[i] == 'pre' & legacy.plots$soil_class_pre.post_bomb[i] == 'post') {
      legacy.plots$LC[i] <- 'Not.Combusted'
    }
  else if (legacy.plots$stand_age_pre.post_bomb[i] == 'post' & legacy.plots$soil_class_pre.post_bomb[i] == 'post') {
       if (legacy.plots$Delta14C[i] < legacy.plots$X14C_yr_stand_age[i]){
      legacy.plots$LC[i] <- 'Not.Combusted'
    }
    else {
      legacy.plots$LC[i] <- 'Combusted'
    }
  }
}

Data Visualization

First we must manipulate the data for the ggplot object. Again combining our C measurements and assigning them a class.

surface.carbon <- legacy.plots[,-17]
surface.carbon$Location <- 'Soil.Surface'
colnames(surface.carbon)[12] <- 'Delta14C'
surface.stand.age <- legacy.plots[,-12]
surface.stand.age$Location <- 'Stand.Age'
colnames(surface.stand.age)[16] <- 'Delta14C'

legacy.combustion <- rbind(surface.carbon, surface.stand.age)

Then we can plot the soil surface C levels against the atmospheric \(CO_2\) at the time of stand establishment for all of the different site types.

#subset the legacy plot data set by site type. 
legacy.combustion.pre.pre <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'pre' & legacy.combustion$soil_class_pre.post_bomb == 'pre',]
legacy.combustion.post.pre <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'post' & legacy.combustion$soil_class_pre.post_bomb == 'pre',]
legacy.combustion.pre.post <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'pre' & legacy.combustion$soil_class_pre.post_bomb == 'post',]
legacy.combustion.post.post <- legacy.combustion[legacy.combustion$stand_age_pre.post_bomb == 'post' & legacy.combustion$soil_class_pre.post_bomb == 'post',]

Figure 2.f-i from original paper

ggplot(legacy.combustion.pre.pre, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated pre-bomb peak and the soil surface was dated pre-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.

ggplot(legacy.combustion.post.pre, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated post-bomb peak and the soil surface was dated pre-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.

ggplot(legacy.combustion.pre.post, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') + scale_linetype_manual(values = c('dashed', 'solid'))

Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated pre-bomb peak and the soil surface was dated post-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.

ggplot(legacy.combustion.post.post, aes(x= Location, y = Delta14C)) + geom_point(aes(shape = Location, color = Location), size = 3) + geom_line(aes(group = site_sample_transect, linetype = LC)) + theme_bw() + scale_color_manual(values = c('darkgreen','darkred')) + labs(x = NULL, y = 'Delta 14 Carbon (parts per thousand)', linetype = 'Legacy Carbon') 

Caption: Surface \(\Delta^{14}C\) (green circles) compared to atmospheric \(\Delta^{14}CO_2\) (red triangles) at the time of stand establishment in sites that originally contained legacy C, the stand was dated post-bomb peak and the soil surface was dated post-bomb peak. Dashed lines represent sites where Legacy C was combusted. Solid lines represent sites where Legacy C was not combusted.

Similar to when we were assessing legacy C presence or absence, we can look at the combustion status relationship with our expected predictor variables.

First we must extract the appropriate plots (where Legacy C was present)

combustion.r.data <- log.r.data[log.r.data$LC == 'Present',]
combustion.r.data$combustion <- legacy.plots$LC #add the combustion class to the new logistic regression data
ggplot(data = combustion.r.data, aes(x = combustion, y = prefire_sol_depth)) + geom_boxplot(aes(color = combustion)) + theme_bw() + labs(title = 'Combustion v. Prefire SOL depth ', x = 'Legacy Carbon', y = 'Prefire SOL depth (cm)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none")

Caption: Legacy C (combusted = green and not combusted = red) plotted against prefire SOL depth. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range).

ggplot(data = combustion.r.data, aes(x = combustion, y = stand_age_rings)) + geom_boxplot(aes(color = combustion)) + theme_bw() + labs(title = 'Combustion v. Stand Age ', x = 'Legacy Carbon', y = 'Stand Age (yr)') + scale_color_manual(values = c('darkgreen', 'darkred')) + theme(legend.position="none") + geom_hline(aes(yintercept = 60), linetype = 'dashed')

Caption: Legacy C (combusted = green and not combusted = red) plotted against stand age. The boxplot displays the median. The two hinges represent the first and third quartiles (25th and 75th percentiles). The lower and upper whiskers extend from the smallest and largest values respectively to no further than 1.5 * IQR (inter-quartile range). The horizontal dashed line at 60 years represents the cutoff for stand age class (young/old)

From both of these plots we can see no evidence for combustion in old sites, and drier and younger sites combusted more frequently.

Multiple Logistic Regression

To test our hypotheses for Legacy C combustion, we again must use a generalized linear model using the log link function because our response variable of combustion status is also binary. In this case our model will look at the predictor variables of stand age and the proportion of the soil organic layer burned (as a proxy for fire severity)

So first we must factor our response variable and calculate the proportion of the soil organic layer burned from the burn depth and the prefire sol depth.

combustion.r.data$combustion <- factor(combustion.r.data$combustion, levels = c('Not.Combusted', 'Combusted'))
levels(combustion.r.data$combustion) #important to factor in the correct order for glm model interpretation
## [1] "Not.Combusted" "Combusted"
combustion.r.data$proportion_sol_burned <- combustion.r.data$burn_depth/combustion.r.data$prefire_sol_depth

Then we can move onto model selection

Model Selection

Again we are creating a full model that includes the predictors interaction term, a reduced model without an interaction term and a null model (intercept only model)

full.combustion <- glm(data = combustion.r.data, formula = combustion ~ proportion_sol_burned * stand_age_rings, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(full.combustion) 
## 
## Call:
## glm(formula = combustion ~ proportion_sol_burned * stand_age_rings, 
##     family = binomial, data = combustion.r.data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.26155  -0.07383   0.00000   0.01511   1.67247  
## 
## Coefficients:
##                                       Estimate Std. Error z value Pr(>|z|)
## (Intercept)                            20.0540    16.9873   1.181    0.238
## proportion_sol_burned                 -13.5919    19.1339  -0.710    0.477
## stand_age_rings                        -0.6923     0.5185  -1.335    0.182
## proportion_sol_burned:stand_age_rings   0.7773     0.6005   1.294    0.196
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.9007  on 18  degrees of freedom
## Residual deviance:  7.5533  on 15  degrees of freedom
## AIC: 15.553
## 
## Number of Fisher Scoring iterations: 10
reduced.combustion <- glm(data = combustion.r.data, formula = combustion ~ proportion_sol_burned + stand_age_rings, family = binomial)
summary(reduced.combustion) 
## 
## Call:
## glm(formula = combustion ~ proportion_sol_burned + stand_age_rings, 
##     family = binomial, data = combustion.r.data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.39200  -0.48671  -0.03575   0.11308   1.79539  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)  
## (Intercept)           -3.78147    2.99380  -1.263   0.2066  
## proportion_sol_burned 12.52782    7.38223   1.697   0.0897 .
## stand_age_rings       -0.05758    0.03494  -1.648   0.0993 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.901  on 18  degrees of freedom
## Residual deviance: 10.708  on 16  degrees of freedom
## AIC: 16.708
## 
## Number of Fisher Scoring iterations: 7
null.combustion <- glm(data = combustion.r.data, formula = combustion ~ 1, family = binomial)
summary(null.combustion) 
## 
## Call:
## glm(formula = combustion ~ 1, family = binomial, data = combustion.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7815  -0.7815  -0.7815   0.4263   1.6340  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   -1.030      0.521  -1.976   0.0481 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.901  on 18  degrees of freedom
## Residual deviance: 21.901  on 18  degrees of freedom
## AIC: 23.901
## 
## Number of Fisher Scoring iterations: 4

And compare the models using a log-likelihood test.

anova(reduced.combustion, full.combustion, test = 'Chisq') 
## Analysis of Deviance Table
## 
## Model 1: combustion ~ proportion_sol_burned + stand_age_rings
## Model 2: combustion ~ proportion_sol_burned * stand_age_rings
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        16    10.7085                       
## 2        15     7.5533  1   3.1552  0.07569 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the chi-squared p value is not significant so the inclusion of the interaction term does not significantly decrease deviance and we can resume with the reduced model.

We can also look at the results of lrtest() from the {lmtest} package

lrtest(reduced.combustion, full.combustion) 
## Likelihood ratio test
## 
## Model 1: combustion ~ proportion_sol_burned + stand_age_rings
## Model 2: combustion ~ proportion_sol_burned * stand_age_rings
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1   3 -5.3542                       
## 2   4 -3.7767  1 3.1552    0.07569 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the higher log likelihood value indicates a greater model fit, but again the chi-squared p value is not significant so even though the full model has a ‘better’ fit it is not significantly better. Therefore, again we can proceed with utilizing the reduced model for our analysis.

Next, we must compare the accepted model(the reduced model) against the null model

anova(null.combustion, reduced.combustion, test = 'Chisq') 
## Analysis of Deviance Table
## 
## Model 1: combustion ~ 1
## Model 2: combustion ~ proportion_sol_burned + stand_age_rings
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1        18     21.901                        
## 2        16     10.709  2   11.192 0.003712 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the chi-squared p value is less than alpha (.05) meaning the fuller model is associated with less residual deviance and therefore is a better fit! We can reject the null hypothesis that the removal of the predictor variables does not result in a loss of fit and proceed with utilizing the reduced model for our analysis.

Interpretation

Our slope estimate (\(\beta1\)) again represents the associated change in the response variable (which if we remember is the natural log of the odds ratio) with an increase in the predictor variable of 1 unit.

The coefficient/estimate for the proportion of the soil organic layer was positive. Therefore increasing proportions of the SOL burned results in an increased likelihood of legacy C being combusted. For every one unit of proportion of SOL burned, the probability of legacy C being combusted increases by 12.52782. Where the estimate for stand age is negative indicating every year the stand ages, the likelihood of legacy C being combusted decreases by 0.05758. Even though neither of these relationships showed a significant p-value.

In order to get the actual odds ratio we must back transform the estimates using the reverse of the link function.

prop.sol.burned.change <- rev.logit(reduced.combustion$coefficients[2]) 
stand.age.change <- rev.logit(reduced.combustion$coefficients[3])

prop.sol.burned.change
## proportion_sol_burned 
##             0.9999964
stand.age.change
## stand_age_rings 
##       0.4856079

1 unit increase in proportion of SOL burned results in a 1% decrease in the likelihood of Legacy C combustion 1 unit increase in stand age results in a 51% decrease in the likelihood of Legacy C combustion

Hypothesis Testing

To test our null hypothesis that the response variable (Legacy C combustion) and our predictor variables have no relationship, we can calculate the Wald statistic (similar to a t-statistic) for each predictor variable and compare them to the z distribution.

#Calculating the Wald Statistic for proportion of SOL burned 
modelresults <- tidy(reduced.combustion)
wald <- modelresults$estimate[2]/modelresults$std.error[2]
p <- 2 * (1 - pnorm(wald))  # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 0.08969244
#for stand age
modelresults <- tidy(reduced.combustion)
wald <- modelresults$estimate[3]/modelresults$std.error[3]
p <- 2 * (1 - pnorm(wald))  # calculation of 2 tailed p value associated with the Wald statistic
p
## [1] 1.900684

Again neither predictor variable seems to show a significant p-value.

Confidence Intervals

Finally, we can calculate the confidence intervals around our estimates

CI <- confint.default(reduced.combustion, level = 0.95) # this function returns CIs based on standard errors instead of based on log-likelihood 
CI
##                            2.5 %      97.5 %
## (Intercept)           -9.6492143  2.08628045
## proportion_sol_burned -1.9410968 26.99672747
## stand_age_rings       -0.1260621  0.01089314
#Calculating the CIs for the actual odds ratios
prop.sol.burned.changeCI <- rev.logit(CI[2, ]) #for prefire SOL depth
stand.age.changeCI <- rev.logit(CI[3, ]) #for stand age 

prop.sol.burned.changeCI
##     2.5 %    97.5 % 
## 0.1255274 1.0000000
stand.age.changeCI
##     2.5 %    97.5 % 
## 0.4685261 0.5027233

Model Predictions

Transform the factored combustion variable into a binary value for the ggplot object

combustion.r.data$combust.bin <- NULL 
for (i in 1:nrow(combustion.r.data)) {
  if (combustion.r.data$combustion[i] == 'Combusted') {
    combustion.r.data$combust.bin[i] <- 1
  }
  else {
    combustion.r.data$combust.bin[i] <- 0
  }
}

To visualize the predicted likelihood of legacy C combustion by each separate predictor variable we must first generate a new model for each variable.

glm.sol.burned <- glm(data= combustion.r.data, formula = combustion ~ proportion_sol_burned, family = 'binomial')
summary(glm.sol.burned)
## 
## Call:
## glm(formula = combustion ~ proportion_sol_burned, family = "binomial", 
##     data = combustion.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4942  -0.6229  -0.3927   0.3403   2.0983  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)  
## (Intercept)             -4.327      2.059  -2.101   0.0356 *
## proportion_sol_burned    6.293      3.454   1.822   0.0685 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.901  on 18  degrees of freedom
## Residual deviance: 17.498  on 17  degrees of freedom
## AIC: 21.498
## 
## Number of Fisher Scoring iterations: 5
glm.stand.age <- glm(data = combustion.r.data, formula = combustion ~ stand_age_rings, family = 'binomial')
summary(glm.stand.age)
## 
## Call:
## glm(formula = combustion ~ stand_age_rings, family = "binomial", 
##     data = combustion.r.data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1644  -0.8727  -0.3323   0.4042   2.2285  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)
## (Intercept)      1.23090    1.30256   0.945    0.345
## stand_age_rings -0.03154    0.01979  -1.594    0.111
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 21.901  on 18  degrees of freedom
## Residual deviance: 17.318  on 17  degrees of freedom
## AIC: 21.318
## 
## Number of Fisher Scoring iterations: 6

Then generate a new range of values of our predictor variables for which we will predict LC combustion.

new.sol.burned <- as.data.frame(seq(min(combustion.r.data$proportion_sol_burned),max(combustion.r.data$proportion_sol_burned), length.out = 32))
names(new.sol.burned) <- 'proportion_sol_burned' 
new.stand.age <- as.data.frame(seq(min(combustion.r.data$stand_age_rings),max(combustion.r.data$stand_age_rings), length.out = 32))
names(new.stand.age) <- 'stand_age_rings'

Finally we can predict the combustion status for the generated predictor variables.

prediction.sol.burned <- cbind(new.sol.burned,predict(glm.sol.burned, newdata = new.sol.burned, type = 'link', se = TRUE))
prediction.stand.age <- cbind(new.stand.age ,predict(glm.stand.age, newdata = new.stand.age, type = 'link', se = TRUE))

and generate the upper and lower confidence intervals for the fitted values.

prediction.sol.burned$LL <- rev.logit(prediction.sol.burned$fit - 1.96 * prediction.sol.burned$se.fit)
prediction.sol.burned$UL <- rev.logit(prediction.sol.burned$fit + 1.96 * prediction.sol.burned$se.fit)
head(prediction.sol.burned)
##   proportion_sol_burned       fit   se.fit residual.scale          LL        UL
## 1             0.1799147 -3.194867 1.474844              1 0.002270320 0.4245377
## 2             0.1999821 -3.068592 1.411628              1 0.002913786 0.4251174
## 3             0.2200494 -2.942317 1.349013              1 0.003734564 0.4259849
## 4             0.2401168 -2.816042 1.287085              1 0.004779021 0.4271827
## 5             0.2601841 -2.689767 1.225949              1 0.006104368 0.4287614
## 6             0.2802515 -2.563492 1.165730              1 0.007780511 0.4307819
prediction.stand.age$LL <- rev.logit(prediction.stand.age$fit - 1.96 * prediction.stand.age$se.fit)
prediction.stand.age$UL <- rev.logit(prediction.stand.age$fit + 1.96 * prediction.stand.age$se.fit)
head(prediction.stand.age)
##   stand_age_rings         fit    se.fit residual.scale        LL        UL
## 1        19.00000  0.63168865 0.9816617              1 0.2154522 0.9279586
## 2        24.77419  0.44958614 0.8927017              1 0.2141467 0.9001850
## 3        30.54839  0.26748363 0.8101150              1 0.2107615 0.8647492
## 4        36.32258  0.08538112 0.7360500              1 0.2046832 0.8217192
## 5        42.09677 -0.09672139 0.6733248              1 0.1952206 0.7725902
## 6        47.87097 -0.27882389 0.6253610              1 0.1817506 0.7204880

Finally we can plot the predicted values and their 95% CI alongside the actual data

Figure 3.c and d from original paper

p <- ggplot(data = prediction.sol.burned, aes(x= proportion_sol_burned, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.sol.burned, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Proportion SOL burned", y ="Pr(Combustion)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = combustion.r.data, aes(x = proportion_sol_burned, y = combust.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1)
p

Caption: Predicted probability of Legacy C combustion for the proportion of SOL burned. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values.

p <- ggplot(data = prediction.stand.age, aes(x= stand_age_rings, y = rev.logit(fit))) + geom_line() + geom_ribbon(data = prediction.stand.age, aes(ymin = LL, ymax = UL), alpha = .2) + labs(x = "Stand Age (yr)", y ="Pr(Combustion)", color = 'Moisture Class', shape = 'Stand Age Class') + geom_point(data = combustion.r.data, aes(x = stand_age_rings, y = combust.bin, color = moisture_class, shape = stand_age_class), size = 3.5) + theme_bw() + scale_color_manual(values = c('darkgreen', 'darkred','darkblue')) + ylim(0,1) + geom_vline(aes(xintercept = 60), linetype = 'dashed')
p

Caption: Predicted probability of Legacy C combustion for stand age. Actual data points are grouped by stand age class and moisture class. The ribbons represent the 95% confidence interval for the predicted values. The dashed line represents the cutoff (60 years) for old vs young stand age classification.

Conclusion

So overall I was able to reproduce the exact figures and reproduce the trends similarly to the original data analysis. As stand age increased, the likelihood of legacy C presence and combustion both decreased. Legacy C combustion was more likely as the proportion of the SOL burned and the presence of legacy C became more likely as the prefire SOL depth increased. These trends indicate that more frequent fire intervals and increased fire severity under climate warming are threats to boreal forests ability to act as C sinks and counteract some of the C emissions contributing to greenhouse gas formation.

Where I strayed was in the statistical significance of my generalized linear model results. As stated in the introduction, the original authors were able to obtain significant results for all of their analyses. They did indicate that they used Firth’s logistic regression using the {logistf} package instead of the standard glm() function in the {lmtest} package to account for highly predictive covariates that are common in generalized linear models using smaller sample sizes. Yet when I tried running the analysis under that package the p-values still showed no significance.

A note on the availability and accessibility of the data and the replication analysis:

Overall I would say the paper and the data was very accessible. The data for this paper was on a very accessibly databaase (DAAC) and the files came with a data user guide including a dataset overview and summary beyond what was outlined in the published paper. The only part I ran into was running the logistic regression as they stated they used Firth’s logistic regression which I attempted and was not able to mimic their results, possibly due to them not giving enough detail about the parameters they were using. Although, the authors do specify in the concluding remarks that they are willing to share their code with anyone who contacts them directly by email. Therefore if I had been inclined to resolve this problem the exact code would have been made available to me.